Multi-view Semi-supervised Learning: An Approach to Obtain Different Views from Text Datasets

نویسندگان

  • Edson Takashi Matsubara
  • Maria Carolina Monard
  • Gustavo E. A. P. A. Batista
چکیده

The supervised machine learning approach usually requires a large number of labelled examples to learn accurately. However, labelling can be a costly and time consuming process, especially when manually performed. In contrast, unlabelled examples are usually inexpensive and easy to obtain. This is the case for text classification tasks involving on-line data sources, such as web pages, email and scientific papers. Semi-supervised learning, a relatively new area in machine learning, represents a blend of supervised and unsupervised learning, and has the potential of reducing the need of expensive labelled data whenever only a small set of labelled examples is available. Multi-view semi-supervised learning requires a partitioned description of each example into at least two distinct views. In this work, we propose a simple approach for textual documents pre-processing in order to easily construct the two different views required by any multi-view learning algorithm. Experimental results related to text classification are described, suggesting that our proposal to construct the views performs well in practice.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Intra-View and Inter-View Supervised Correlation Analysis for Multi-View Feature Learning

Multi-view feature learning is an attractive research topic with great practical success. Canonical correlation analysis (CCA) has become an important technique in multi-view learning, since it can fully utilize the inter-view correlation. In this paper, we mainly study the CCA based multi-view supervised feature learning technique where the labels of training samples are known. Several supervi...

متن کامل

Active + Semi-supervised Learning = Robust Multi-View Learning

In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated (i.e., every example is identically labeled by the target concepts in eac...

متن کامل

Learning with Low-Quality Data: Multi-View Semi-Supervised Learning with Missing Views

The focus of this thesis is on learning approaches for what we call “low-quality data” and in particular data in which only small amounts of labeled target data is available. The first part provides background discussion on low-quality data issues, followed by preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning. Multi...

متن کامل

Two-View Label Propagation to Semi-supervised Reader Emotion Classification

In the literature, various supervised learning approaches have been adopted to address the task of reader emotion classification. However, the classification performance greatly suffers when the size of the labeled data is limited. In this paper, we propose a two-view label propagation approach to semi-supervised reader emotion classification by exploiting two views, namely source text and resp...

متن کامل

Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions which may complete the missing views in an approximate way. This situation corresponds for example to learning text classifiers from multilingual collections where documents are not available in all languag...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005